Abstract


Replication is a well-known technique for achieving high availability of data in a
distributed system. Due to partition failures, the network may split into a set of groups.
Transactions performed simultaneously in these groups might cause inconsistency of
the replicated data. Voting is a commonly used approach to maintaining consistency of
replicated data. In this approach, each node is assigned a particular number of votes,
and any group with a majority of votes can perform critical operations. The number of
votes assigned to the nodes can have a significant impact on system performance. In
this thesis, we propose an integer programming approach to the vote assignment
problem, using average transaction gain per unit time as the performance metric. We
use Monte Carlo simulation to find the most likely groups formed due to partition
failures and the average transaction gain in them per unit time. We then use these
groups to obtain the vote assignment using integer programming. We suggest three
integer programming formulations for the vote assignment problem. Unlike the
heuristics proposed in the literature, this approach uses global information to
determine the vote assignments. We have tried this approach on different networks and
observed that in many cases it assigns votes equivalent to or better than the best vote
assignment given by the heuristics. Based on our experiments, we feel that this approach
will result in a better vote assignment when the most likely groups formed due to
partition failures are few in number.



Contents


1 Introduction
1.1 Need for optimal vote assignment
1.2 Our approach
1.3 Outline of the thesis
2 Related work
3 Problem formulation
3.1 Selecting Network Partitions
3.2 Integer programming formulations
3.2.1 Notations
3.2.2 Formulation 1
3.2.3 Formulation 2
3.2.4 Formulation 3
4 Implementation of VAT
4.1 Input format
4.2 Output format
4.2.1 Normal mode
4.2.2 Verbose mode
4.3 Simulation model
4.4 Integer programming
4.5 Random graph generator
5 Experiments
5.1 Experiment 1
5.2 Experiment 2
6 Conclusions
References



Chapter 1 


INTRODUCTION 


A distributed computer system is characterised by the presence of a number of
processing units connected through a communication network. The availability of data in
the system depends heavily on the reliability of the site at which it is stored. By
replicating critical data at multiple sites that have independent failure modes, the
probability of the data being accessible even in the presence of node and link failures
can be increased [CD 88, PN 86, PB 85]. The presence of data at multiple sites can also
improve the response time of the system.

However, data replication requires that sites coordinate among themselves when
accessing the data, such that the overall view of any process is as if there were a
single copy in the system. This requires replica control protocols. A replica control
mechanism has to ensure that a consistent view of the data is offered to user processes
even in the face of node failures and network partitioning.
There are various methods for controlling access to replicated data [DGB 85]. One
of the well-known strategies is majority voting [TH 79]. In majority voting, each
site is assigned a particular number of votes. A node which wants to perform an
update on the data must collect a majority of votes before performing the operation.
In other words, a set of nodes which together hold a majority of votes should agree for
this operation to proceed. Since only one of the groups formed during a partition failure
can get a majority of votes, it satisfies the mutual exclusion criterion even under
network partitioning.

A generalisation of majority voting is the weighted voting approach [GI 79]. In
weighted voting, a site is permitted to read only if it has obtained a read quorum of r
votes and is allowed to write only if it has obtained a write quorum of w votes. To
perform a read operation, the node reads the replicas at all the nodes which have
sent these votes and takes the most recent copy (using version vectors [PR 83]); to
perform an update, it updates all the replicas present at these nodes. The read quorum
(r) and write quorum (w) are such that w > N/2 and r + w > N, where N is the total
number of votes in the system. This choice of quorums ensures that every read quorum
has a non-null intersection with every write quorum, and that every write quorum
intersects every other write quorum. The non-null intersection between read and
write quorums ensures that the value read is the most recent one, and the non-null
intersection between write quorums ensures that the mutual exclusion criterion
is satisfied. Many variations of the weighted voting technique have been proposed
[RT 88, CMM 90, JP 86, JM 87, DD 89, EGA 86].
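The two quorum conditions can be checked mechanically. The following is a minimal sketch; the function name `quorums_are_valid` is an illustrative assumption and does not come from the thesis.

```python
def quorums_are_valid(r: int, w: int, total_votes: int) -> bool:
    """Check the weighted-voting conditions w > N/2 and r + w > N."""
    # Two write quorums must overlap: w > N/2 (written as 2w > N to stay in integers).
    writes_intersect = 2 * w > total_votes
    # Every read quorum must overlap every write quorum: r + w > N.
    read_write_intersect = r + w > total_votes
    return writes_intersect and read_write_intersect


# With N = 5 votes, r = w = 3 is a valid choice, while r = 2, w = 3 is not,
# because a read quorum of 2 could miss a write quorum of 3 entirely.
print(quorums_are_valid(r=3, w=3, total_votes=5))
print(quorums_are_valid(r=2, w=3, total_votes=5))
```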
Figure 1.1: Sample network topology


1.1 Need for optimal vote assignment 


The performance of a system using weighted voting depends critically on how
the votes are assigned. For example, let us consider the network shown in Fig. 1.1.

Let us assume uniform vote assignment (1 vote to each node) and let read quorum
= write quorum = 3. When node 3 and link (1,4) fail, none of the groups can perform
operations. If these two fail very often, the system throughput will go down drastically.
If nodes 4 and 5 are relatively more reliable than nodes 1 and 2, then one of the
vote assignments (1,1,1,3,1) or (0,0,0,1,0) could have given better throughput.
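The effect of the three vote assignments can be illustrated with a small majority check. This is only a sketch: the topology of Fig. 1.1 is not reproduced here, so the two surviving groups {1,2} and {4,5} are an assumption, and the helper name `has_majority` is hypothetical.

```python
def has_majority(group, votes):
    """Return True if the nodes in `group` hold more than half of all votes."""
    total = sum(votes.values())
    held = sum(votes[node] for node in group)
    return 2 * held > total  # strict majority, kept in integer arithmetic


# Assumed groups after node 3 and link (1,4) fail (not stated in the text).
groups = [{1, 2}, {4, 5}]

uniform = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
weighted = {1: 1, 2: 1, 3: 1, 4: 3, 5: 1}
singleton = {1: 0, 2: 0, 3: 0, 4: 1, 5: 0}

for name, votes in [("uniform", uniform), ("weighted", weighted), ("singleton", singleton)]:
    print(name, [has_majority(g, votes) for g in groups])
```

Under these assumed groups, the uniform assignment leaves both groups without a majority, while either of the other two assignments lets the group containing node 4 proceed.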

Different votes assigned to the nodes give the nodes different priorities in transaction
processing [MM 89]. Giving all votes to a single node (singleton voting) is similar to
the well-known primary copy approach [AD 76]. Whatever the vote assignment,
it can always lead to a halt state, in which the system cannot perform any
operation because none of its groups holds sufficient votes to obtain a quorum.


Since halt states reduce the performance of the system, the vote assignment
should be such that it reduces the possibility of their occurrence. If we consider a
general network with N nodes, the number of possible vote assignments is exponential
in N [GB 85]. So it is not feasible to enumerate the vote assignments for big networks.
Some other method is needed for assigning votes to nodes in a network.


1.2 Our approach 

In this thesis, we propose an approach for assigning votes which is based on integer
programming. It is an engineering approach to the vote assignment problem, leading
to approximate solutions. For a network, first the top k partitions are determined
(using simulation), where k is a parameter chosen such that most of the likely
partitions are included. With these k partitions, the vote assignment problem is
formulated as an integer programming problem, with the objective of minimising the
average number of transactions that cannot be satisfied due to partitions. So the
metric that we use for evaluation is the average number of transactions "lost" due to
partitions per unit time.

The entire procedure has been implemented in the Vote Assignment Tool (VAT).
VAT takes as input the topology of the network, along with failure probabilities. It
performs simulation, formulates the problem as an integer programming problem,
and solves it using known techniques for integer programming. The final output of
VAT is the vote assignment, together with performance data of the vote assignment.


1.3 Outline of the thesis 


The rest of the thesis is organised as follows. In the next chapter, we briefly discuss
the different types of vote assignment approaches proposed in the literature. In
Chapter 3, we discuss our approach in detail and give three integer programming
formulations for the vote assignment problem. Chapter 4 deals with the implementation
details of the Vote Assignment Tool (VAT). In Chapter 5, we give the experimental results
obtained using VAT. In Chapter 6, we offer some conclusions.



Chapter 2 


RELATED WORK 


The group of nodes which should agree for an operation to proceed is called the
quorum group for that operation. In the weighted voting scheme, quorum groups are formed
dynamically by collecting votes from different nodes. Another approach, proposed
in [GB 85], is called coteries. Here quorum groups (which may be exponential in
number) are maintained explicitly. Let U be the set of nodes in the system and C be
a set of groups. C is a coterie iff:


1. If $S_i \in C$ then $S_i \neq \emptyset$ and $S_i \subseteq U$.

2. If $S_i, S_j \in C$ then $S_i \cap S_j \neq \emptyset$.

3. If $S_i \in C$ then for all $S_j \subset S_i$, $S_j \notin C$.
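The three conditions translate directly into a membership test. The sketch below is an illustration only; the function name `is_coterie` is an assumption, and groups are represented as plain Python sets.

```python
from itertools import combinations


def is_coterie(groups, universe):
    """Check the three coterie conditions for a collection of node groups."""
    sets = [frozenset(g) for g in groups]
    u = frozenset(universe)
    # Condition 1: every group is non-empty and drawn from the node set U.
    if any(len(s) == 0 or not s <= u for s in sets):
        return False
    # Condition 2: every pair of groups has a non-null intersection.
    if any(not (a & b) for a, b in combinations(sets, 2)):
        return False
    # Condition 3: no group is a proper subset of another group in C.
    if any(a < b or b < a for a, b in combinations(sets, 2)):
        return False
    return True


print(is_coterie([{1, 2}, {2, 3}, {1, 3}], {1, 2, 3}))
print(is_coterie([{1, 2}, {1, 2, 3}], {1, 2, 3}))
```

The first call describes the majority groups of a three-node system; the second violates condition 3 because {1,2} is contained in {1,2,3}.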

Here, for any operation to take place, a node should get consent from all nodes
present in at least one of the groups. Since there is a non-null intersection among
the groups in a coterie, the mutual exclusion constraint is satisfied. Coteries give
better reliability than voting. As is shown in [GB 85], every vote assignment can be
represented by a coterie, but the reverse is not true. It was also proved that
getting the best coterie is much more complex than getting the best vote assignment.
An enumeration algorithm was proposed there, which gives all coteries corresponding to
vote assignments but only a subset of ND-coteries. This algorithm is not feasible for
big networks (>5 nodes). Several variations of this concept have been proposed by
Maekawa [MK 85], Tripathi [SK 91] et al. But all of these approaches suffer from the
drawback that the groups explicitly stored may not be the best ones, and there is
an overhead involved in maintaining the quorum groups.

Garcia-Molina and Barbara [BG 87, BG 84] have proposed a number of heuristics,
with different types of metrics, for general networks. These heuristics determine the
votes to be assigned to a node based on the failure probabilities of the nodes and the
links. They use local information, namely the failure probabilities of neighbouring nodes
and links, in determining the vote assignment. For example, in one of the heuristics the
votes of a node are proportional to the product of its reliability and the sum of the
reliabilities of all the links incident to it. Tang and Kain [TK 91] have proposed
algorithms for optimal vote assignment for networks in which links do not fail. They use
a logarithmic function of $p_i$, where $p_i$ is the probability of node i being up. Since
links do not fail, with this vote assignment it can be proved that a group $G_i$ gets
a majority of votes iff the probability of occurrence of $G_i$ is greater than the
probability of occurrence of its complement $\bar{G}_i$, which is a key factor in getting
the optimal vote assignment. They have also proposed a heuristic for general networks.
Tang and Natarajan [TN 89] have proposed an approach to get the optimal acceptance sets
(equivalent to coteries) by formulating this as a sparse integer programming problem,
but this works only for small systems, because for large systems the number of
constraints becomes very large, making an integer programming solution very difficult.

In our approach, we consider static vote assignment for a general network where
nodes and links can fail. Different metrics have been proposed for evaluating vote
assignments, such as the probability that the system is not in a halt state [BG 87] and
node vulnerability [BG 84]. We take as the metric the average number of transactions
that are lost due to node failures and network partitions per unit time. A request
arriving in a partition that does not have the necessary quorum is considered lost.

We take an engineering approach to solving the problem of vote assignment. The
top k partitions of the network are considered, and then the problem is framed as an
integer programming problem, with the goal of minimising the operations lost. We
have designed a tool, called the Vote Assignment Tool (VAT), that performs all these
steps and, given the network configuration, outputs a possible vote assignment.
Chapter 3


PROBLEM FORMULATION 


We model a distributed system as an undirected graph where each node represents
a computer and each edge a communication link. Data is replicated at all the nodes
present in the network. We assume that there is an underlying concurrency control
mechanism which takes care of consistency criteria (such as serialisability [BH 87])
during concurrent requests. We assume that all nodes in the network have the same
processing power.

Both nodes and links can fail. Failures might cause partitioning of the network.
We assume that all failures are statistically independent. It is assumed that all
the processors are fail-stop [SS 83], that is, they fail by ceasing to function (no
Byzantine failures [LSP 82]), and that there are no transmission errors. We do not
distinguish between node failures and network partitions; a node failure appears as
a network partition, with one partition having only the failed node. We assume that
the lifetimes and repair times of nodes and links are exponentially distributed. In our
model we work with the steady state probabilities of each node or link being up. We
assume that the following parameters are known about the network:

• $p_i$ = steady state probability for the node i to be up

• $l_i$ = steady state probability for the link i to be up

• $\lambda_i$ = arrival rate of requests at node i

We consider the average number of transactions that are performed by the system
per unit time as the performance criterion. The vote assignment should be such that
it maximises the average number of transactions performed by the system per unit
time. The basic idea is that when a network gets partitioned, the vote assignment
should be such that the group that has the largest number of transactions coming in
gets the majority. Since there are many possible partitions, we have to select a
vote assignment such that, over the groups that get a majority, the sum of the average
number of transactions performed per unit time is maximum.

3.1 Selecting Network Partitions 

Since the reliabilities of the nodes and links are known, Monte Carlo simulation [ND 79]
is performed to see how the network gets partitioned due to these failures. As the
arrival rates at the nodes are known, the transactions that would get done in each of
the groups formed, if they were assigned a majority of votes, can be calculated.
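The idea of sampling partitions can be sketched as follows. This is a simplified stand-in, not the simulator used in the thesis: it draws independent snapshots from the steady state up-probabilities instead of simulating failures and repairs over time, and the function names and dictionary-based inputs are assumptions made for illustration.

```python
import random
from collections import Counter


def sample_groups(node_up, link_up, rng):
    """Sample one network state and return its connected groups of live nodes."""
    alive = {n for n, p in node_up.items() if rng.random() < p}
    adj = {n: set() for n in alive}
    for (a, b), p in link_up.items():
        # A link is usable only if both end nodes and the link itself are up.
        if a in alive and b in alive and rng.random() < p:
            adj[a].add(b)
            adj[b].add(a)
    groups, seen = [], set()
    for start in alive:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search over one connected component
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        groups.append(frozenset(comp))
    return groups


def estimate_group_frequencies(node_up, link_up, samples=10000, seed=0):
    """Estimate the fraction of sampled states in which each group appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        counts.update(sample_groups(node_up, link_up, rng))
    return {g: c / samples for g, c in counts.items()}
```

The estimated frequencies play the role of the fraction of time a group is present; a time-based simulation would be needed to reproduce the thesis's figures.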



For a group $G_i$:

Possible transaction gain in $G_i$ per unit time:

$$\mathrm{TR\_GAIN\_RATE}(G_i) = \sum_{j \in G_i} \lambda_j$$

Average possible transaction gain in $G_i$:

$$\mathrm{AVG\_TR\_GAIN}(G_i) = \mathrm{TR\_GAIN\_RATE}(G_i) \times FT_i$$

where $FT_i$ is the fraction of total time for which group $G_i$ is present.
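The two quantities above can be computed as in the following sketch. The function names mirror the thesis's notation, but the dictionary representation and the sample numbers are assumptions chosen only for illustration.

```python
def tr_gain_rate(group, arrival_rate):
    """Sum of request arrival rates over the nodes of the group."""
    return sum(arrival_rate[j] for j in group)


def avg_tr_gain(group, arrival_rate, fraction_of_time):
    """Gain rate weighted by the fraction of time the group is present."""
    return tr_gain_rate(group, arrival_rate) * fraction_of_time


# Hypothetical values: three nodes with 200 requests per unit time each,
# and a group {1, 2} assumed to be present for a quarter of the time.
rates = {1: 200, 2: 200, 3: 200}
print(tr_gain_rate({1, 2}, rates))
print(avg_tr_gain({1, 2}, rates, 0.25))
```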


A simple strategy would have been to find the average possible transaction gain
in each of the possible groups and assign votes such that the average transaction
gain is maximum. Since the set of possible groups formed due to partition failures
is exponential in N, this would require keeping track of an exponential set of groups.
Hence, it is not possible to use this approach for large networks.

Whatever the vote assignment, only one of the groups formed at any instant
of time can perform restricted operations such as updates. So it is better to
assign a majority of votes to the high priority groups formed during simulation time.
We define the highest priority group (HPG) as the group which has the maximum potential
for transactions that would get done if it were assigned a majority of votes. Let C be
the set of groups formed due to a partition failure. HPG can be defined as:

HPG $\in$ C such that TR_GAIN_RATE(HPG) $\ge$ TR_GAIN_RATE($G_i$) for all $G_i \in$ C.

In our modelling, we keep track of only the HPG and the group next to HPG (NHPG).
NHPG $\in$ C such that TR_GAIN_RATE(NHPG) $\ge$ TR_GAIN_RATE($G_i$)
and NHPG $\neq$ HPG, for all $G_i \in$ C with $G_i \neq$ HPG.
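Selecting HPG and NHPG from one partition amounts to ranking its groups by gain rate. The sketch below assumes groups are given as sets of node identifiers and arrival rates as a dictionary; the function name `hpg_and_nhpg` is hypothetical, and ties are broken by input order, which the thesis does not specify.

```python
def hpg_and_nhpg(groups, arrival_rate):
    """Return the highest and next-highest priority groups of a partition."""
    ranked = sorted(
        groups,
        key=lambda g: sum(arrival_rate[j] for j in g),  # TR_GAIN_RATE of the group
        reverse=True,
    )
    hpg = ranked[0] if ranked else None
    nhpg = ranked[1] if len(ranked) > 1 else None
    return hpg, nhpg


# Hypothetical partition of a five-node network into three groups.
rates = {1: 50, 2: 80, 3: 10, 4: 200, 5: 30}
partition = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]
print(hpg_and_nhpg(partition, rates))
```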

We find AVG_TR_GAIN($G_i$) for all $G_i$ when they are present as HPG or
NHPG. Since it is assumed that few big groups are formed due to network partitions,
there won't be much overhead in keeping track of the transactions that will be
done in these groups. The groups which are present as NHPG are also considered, to
take care of the situation in which some groups are often present as one of the high
priority groups but are present as HPG for only a short duration. We have
observed that AVG_TR_GAIN gives better priorities when we consider both
HPG and NHPG groups. The ideal case would have been to accumulate AVG_TR_GAIN
whenever the group is present. But, as shown above, this would lead us to keeping
track of an exponential set of groups.

Through simulation we get a set of groups, with the possible total transaction gain in
each group when it is present as HPG or NHPG.


3.2 Integer programming formulations 

It is easy to observe that it may not always be possible to assign a majority of votes to
all these groups. Our aim now is to assign votes in such a way that the groups
getting a majority of votes have the maximum sum of AVG_TR_GAIN. We have formulated
this as an integer programming problem [HS 89]. As is clear from the formulations
given below, the number of constraints is proportional to the number of groups
for which we are trying a majority vote assignment. The complexity of integer
programming problems is exponential in both the number of variables and the number of
constraints [KM 84]. So it is not possible to give a large set of groups. For our
experiments, the top 20 groups are given to the integer programming formulation. The
number of groups can be changed; however, the number of groups considered should not
be very high, due to the NP-completeness of the integer programming problem. Our
assumption that the most likely big groups formed are very few may not always hold.
When the average transaction gain in the groups other than the top 20 is only a small
fraction of that of the topmost group, we assume that they have a negligible effect on
the average transaction gain in the system. Otherwise, omission of the groups other than
the top 20 might be costly. When this is the case, VAT selects the best one out
of the singleton assignment, the uniform assignment and the vote assignment produced by
the integer programming formulation, using average transaction gain in the system per
unit time as the metric.

We propose three integer programming formulations for doing this. In formulations
1 and 2 we select the best subset out of these k (k = 20) groups which will get
a majority of votes, such that the sum of average transactions gained in them is maximum.
If there are a number of possible vote assignments which can do the same, then these
formulations select the one with the minimum total votes assigned. In formulation 1,
each of the k groups is given as a constraint to the integer programming routine. In
formulation 2, first the intersecting groups are found and then an integer programming
solution is tried. The third formulation uses one of the above two formulations to
find the groups which should get a majority of votes; if a number of vote
assignments are possible, it selects the one which assigns votes similar to those
assigned by a modified version of the heuristic proposed in [BG 87]. We have used this
heuristic since it is one of the better heuristics proposed. Any other heuristic can be
tried in its place. Formulation 3 will give a better vote assignment, but it is more
complex than the other two.

Before giving the mathematical forms for each of them, we shall explain the no- 
tations used. For each formulation, first we give the mathematical forms followed by 
an example to clarify them. 

3.2.1 Notations 

$N$      Total number of nodes in the network

$k$      Number of groups selected by the simulator module for integer programming
         solution

$G_i$    i'th group selected by the simulator

$T_i$    Average transaction gain in group $G_i$

$X_i$    Binary variable

$d$      A large number



3.2.2 Formulation 1 


Through formulation 1, we get a vote assignment which assigns a majority of votes
to a subset of the k groups for which the sum of $T_i$ is maximum. If a number of
vote assignments are possible, it selects the one which has the minimum number of total
votes assigned.

As mentioned above, $X_i$ is a binary variable. $X_i$ will be zero if group $G_i$ has
a majority of votes and one otherwise. We let d be a large number, such that
it is greater than the majority of votes to be assigned. The integer programming model is:

Minimise:

$$\sum_{i=1}^{k} (T_i \cdot X_i) \qquad (3.1)$$

subject to

$$\frac{\left(\sum_{j=1}^{N} V_j\right) + 1}{2} - \sum_{h \in G_i} V_h \le d \cdot X_i, \quad (i = 1, \ldots, k) \qquad (3.2)$$

$$X_i = 0 \text{ or } 1 \quad (i = 1, \ldots, k) \qquad (3.3)$$


Since $X_i = 0$ represents the i'th group getting a majority of votes, $T_i \cdot X_i$
represents the average number of transactions lost due to the i'th group not getting
a majority of votes. The objective function (eq. 3.1) minimises the sum of the
average number of transactions lost due to the groups not getting a majority of votes,
which is our aim. The first term in constraint (eq. 3.2),
$\frac{(\sum_{j=1}^{N} V_j) + 1}{2}$, represents the majority of votes, and the second
term, $\sum_{h \in G_i} V_h$, represents the votes assigned to group $G_i$.

Case 1: When $X_i$ is zero, constraint (3.2) becomes

$$\sum_{h \in G_i} V_h \ge \frac{\left(\sum_{j=1}^{N} V_j\right) + 1}{2} \qquad (3.4)$$

which makes sure that the i'th group has votes greater than or equal to the majority
of votes.

Case 2: When $X_i$ is one, constraint (3.2) becomes

$$\frac{\left(\sum_{j=1}^{N} V_j\right) + 1}{2} - \sum_{h \in G_i} V_h \le d \qquad (3.5)$$

The value of d is chosen such that when $X_i$ is one, whatever the votes assigned
to group $G_i$, constraint (3.2) is still satisfied. As we can see from (eq. 3.5), the
value on the left hand side is at most the majority of votes (since votes are
non-negative). So the value of d is chosen greater than the possible majority of votes
to satisfy constraint (3.2). This also means that constraint (3.2) may assign a majority
of votes to group $G_i$ when $X_i$ is one. We have to make sure that whenever $X_i$ is
one the i'th group does not have a majority of votes. Since the objective function
minimises $T_i \cdot X_i$, it will always try to set all $X_i$'s to zero. By case 1,
$X_i = 0$ means $G_i$ gets a majority of votes. If it is not possible to assign a majority
of votes to all the groups, then some $X_i$'s will have to take the value one. The
objective function selects a subset of the k groups such that the sum of $T_i$ for these
groups is maximum and sets the corresponding $X_i$ to zero. Constraint (3.2) forces
these groups to get a majority of votes. All the other groups, which have the
corresponding $X_i$ as one, do not get a majority of votes, since the objective function
would have set $X_i$ to zero if they could get a majority of votes. For our experiments
we have chosen d as 500.


Example: 


Let us consider a 3 node fully connected network, each component with a reliability
of 0.9; let the request arrival rate at every node be 200 per unit time and let k = 3.
The top 3 partitions are found to be (using simulation) {1,2,3}, {1,2} and {2,3}. The
average transaction gains in these groups are 400, 30 and 30 respectively. With d = 500,
multiplying constraint (3.2) by 2 and rearranging, the above formulation results in:

Minimise:

$$400 X_1 + 30 X_2 + 30 X_3$$

subject to:

$$-V_1 - V_2 - V_3 - 1000 X_1 \le -1$$

$$-V_1 - V_2 + V_3 - 1000 X_2 \le -1$$

$$V_1 - V_2 - V_3 - 1000 X_3 \le -1$$

$$X_1, X_2, X_3 = 0 \text{ or } 1$$

This will result in a vote assignment of (0,1,0).
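For a network this small, the result can be cross-checked by exhaustive enumeration instead of an integer programming solver. The sketch below is such a brute-force check and is not the method used by VAT; the function name and the vote cap `max_votes` are assumptions. It minimises the lost gain first and the total votes second, matching the tie-breaking rule stated for formulation 1.

```python
from itertools import product


def solve_by_enumeration(groups, gains, n_nodes, max_votes=2):
    """Try every vote vector with entries in 0..max_votes and keep the best one."""
    best = None
    for votes in product(range(max_votes + 1), repeat=n_nodes):
        total = sum(votes)
        lost = 0
        for group, gain in zip(groups, gains):
            held = sum(votes[node - 1] for node in group)
            # The group has a majority iff held >= (total + 1) / 2; otherwise X_i = 1.
            if 2 * held < total + 1:
                lost += gain
        key = (lost, total)  # minimise lost gain, then total votes
        if best is None or key < best[0]:
            best = (key, votes)
    return best[1], best[0][0]


votes, lost = solve_by_enumeration([{1, 2, 3}, {1, 2}, {2, 3}], [400, 30, 30], n_nodes=3)
print(votes, lost)
```

Enumeration grows exponentially with the number of nodes, so it is only useful as a sanity check on toy examples such as this one.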



3.2.3 Formulation 2


For a vote assignment to exist, a necessary (but not sufficient) condition is that all the
groups should intersect with each other. The integer programming algorithm might
take more time to realise this. So we can make the formulation more efficient by
filtering the constraints with the above rule.

It can be shown that finding the best possible set of groups which intersect with
each other is NP-complete (it is reducible to the graph clique problem [KM 84]). The
problem's complexity is exponential in the number of groups. Since the total number
of groups we are considering is constant (20), this does not become too much overhead.
The time taken by formulation 1 to find the intersecting groups is found to be much
more than the time taken by this algorithm. We have implemented an algorithm which
gives the best possible subsets of the groups which intersect, in decreasing order of
average transaction gain. We try for an integer programming solution with these groups
using the mathematical form shown below. If there is no solution with these groups,
we try the next best set of groups which intersect, and so on (a vote assignment may
not be possible for intersecting groups, e.g. a coterie). The integer programming
model is:

Minimise:

$$\sum_{i=1}^{N} V_i \qquad (3.6)$$

Subject to:

$$\sum_{j \in G_i} V_j \ge \frac{\left(\sum_{j=1}^{N} V_j\right) + 1}{2}, \quad (i = 1, \ldots, k') \qquad (3.7)$$

where k' = intersecting groups of the k groups.

Since $V_i$ represents the votes assigned to node i, the objective function (eq. 3.6)
minimises the total number of votes assigned. Constraint (eq. 3.7) makes
sure that all the groups (the intersecting groups found above) get a majority of
votes. So the above formulation gives a vote assignment, if one exists, with the minimum
number of votes assigned such that all the groups given to the formulation have
a majority of votes. If no vote assignment is possible, the previous algorithm
is used to get the next best set of intersecting groups. This process is continued till
a vote assignment is obtained.
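The intersection filter that precedes the integer program can be sketched as below. This is a plain exhaustive search over subsets, written only to illustrate the idea; the thesis does not describe its algorithm in this detail, and the function names are assumptions.

```python
from itertools import combinations


def pairwise_intersecting(groups):
    """True if every pair of groups shares at least one node."""
    return all(set(a) & set(b) for a, b in combinations(groups, 2))


def best_intersecting_subset(groups, gains):
    """Return the pairwise-intersecting subset with the largest total gain."""
    best, best_gain = (), 0
    indices = range(len(groups))
    for size in range(1, len(groups) + 1):
        for subset in combinations(indices, size):
            if pairwise_intersecting([groups[i] for i in subset]):
                gain = sum(gains[i] for i in subset)
                if gain > best_gain:
                    best, best_gain = subset, gain
    return [groups[i] for i in best], best_gain


chosen, gain = best_intersecting_subset([{1, 2, 3}, {1, 2}, {2, 3}], [400, 30, 30])
print(chosen, gain)
```

The search visits every subset, so its cost is exponential in the number of groups, which is consistent with the remark that the number of groups is kept constant and small.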

Example: 

Let us consider the same network given in the previous example. The top groups
are found to be {1,2,3}, {1,2} and {2,3}. The average transaction gains in these
groups are 400, 30 and 30 per unit time respectively. The intersection finding module
will give all the three groups as intersecting groups. The formulation will result in:

Minimise:

$$V_1 + V_2 + V_3$$

subject to:

$$V_1 + V_2 + V_3 \ge 1$$

$$V_1 + V_2 - V_3 \ge 1$$

$$-V_1 + V_2 + V_3 \ge 1$$

This will result in a vote assignment of (0,1,0).

With the above two formulations we get a vote assignment in which the best subset
of the top k (20) groups gets a majority of votes. But there might exist a number
of possible vote assignments which can do the same. In that case the above
approaches select the one with the least number of total votes assigned. In the next
formulation we try to select the best one among the possible vote assignments.

3.2.4 Formulation 3 

In this formulation, we use either formulation 1 or formulation 2 to find the subset
(k') of the k groups for which it is possible to assign a majority of votes, such that the
sum of average transaction gain in these groups is maximum. The basic idea in this
approach is that when the top 20 groups do not by themselves determine a single vote
assignment, we use one of the known heuristics to give weights to the nodes. Then, while
ensuring that the best subset k' out of the k groups still gets a majority of votes, we
try to assign votes such that the percentage of votes assigned to each node is greater
than or equal to the corresponding percentage of these weights. In this way we try to
give priority to individual nodes in the vote assignment. If it is not possible to assign
votes to all the nodes on this principle, then we assign them to the best subset of
the N nodes, which is determined through the above weights. We use a modified version of
the heuristic proposed in [BG 87] to get the weights. This makes sure that we give
a majority of votes to the best subset among the top 20 most likely groups, and it will
hopefully also give a majority of votes to groups outside the top 20 which have high
average transaction gain.

Heuristic 1 

Weight assigned to node i: $w_i = 100 \cdot \lambda_i \cdot p_i \cdot \sum (p_k \cdot l_j)$,
summed over all j such that $l_j$ is the link between nodes i and k.

The multiplication factor (100) reduces the error due to the round-off operation.
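Heuristic 1 can be evaluated with a few lines of code. The sketch below assumes links are keyed by their two end nodes and that probabilities and arrival rates are held in dictionaries; these representations and the function name are illustrative choices, not part of VAT.

```python
def heuristic_weights(node_up, link_up, arrival_rate):
    """Weight of node i = 100 * lambda_i * p_i * sum over incident links of p_k * l_j."""
    weights = {}
    for i in node_up:
        neighbour_sum = 0.0
        for (a, b), l in link_up.items():
            if i == a:
                neighbour_sum += node_up[b] * l
            elif i == b:
                neighbour_sum += node_up[a] * l
        weights[i] = 100 * arrival_rate[i] * node_up[i] * neighbour_sum
    return weights


# Three-node fully connected network, every component 0.9 reliable,
# 200 requests per unit time at each node.
p = {1: 0.9, 2: 0.9, 3: 0.9}
links = {(1, 2): 0.9, (2, 3): 0.9, (1, 3): 0.9}
lam = {1: 200, 2: 200, 3: 200}
print({n: round(w) for n, w in heuristic_weights(p, links, lam).items()})
```

For this network each node has two incident links, so each weight is 100 * 200 * 0.9 * (0.81 + 0.81) = 29160.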

This formulation is similar to formulation 1, where we tried to select the best
subset of k groups; here we use the same idea to get the best subset of N nodes. $X_i$
is a binary variable. $X_i$ will be zero if node i is assigned a percentage of votes
greater than or equal to the percentage of its weight $w_i$, and one otherwise. When it
is not possible to assign votes to all the nodes using the above principle, we try to
get the best subset of the nodes such that the sum of the percentages of weights is
maximum and all the groups getting a majority of votes in formulation 1 or formulation 2
still have a majority of votes. Let TV be the total number of votes assigned by
formulation 1 or 2. The integer programming model is:

Minimise:

$$\sum_{i=1}^{N} (w_i \cdot X_i) \qquad (3.8)$$

Subject to:

$$\sum_{j \in G_i} V_j \ge \frac{\left(\sum_{j=1}^{N} V_j\right) + 1}{2}, \quad (i = 1, \ldots, k') \qquad (3.9)$$

$$\frac{w_i}{W} - \frac{V_i}{V} \le X_i, \quad (i = 1, \ldots, N) \qquad (3.10)$$

where

$$W = \sum_{i=1}^{N} w_i \qquad (3.11)$$

$$V = p \cdot TV \qquad (3.12)$$

We take p such that $p \cdot TV \le \mathrm{maximum}(41, N)$
and $(p + 1) \cdot TV > \mathrm{maximum}(41, N)$.
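The two conditions on p pin it down as the integer quotient of maximum(41, N) by TV. A minimal sketch, in which the function name and the parameter name `floor_votes` are assumptions:

```python
def scaled_total_votes(tv, n_nodes, floor_votes=41):
    """Return (p, V) with V = p * TV, where p is the largest integer such that
    p * TV <= max(floor_votes, n_nodes)."""
    if tv <= 0:
        raise ValueError("TV must be a positive number of votes")
    target = max(floor_votes, n_nodes)
    p = target // tv  # then p * tv <= target < (p + 1) * tv
    return p, p * tv


# TV = 1 (as produced by the vote assignment (0,1,0)) on a 3-node network.
print(scaled_total_votes(tv=1, n_nodes=3))
```

Note that if TV already exceeds maximum(41, N), this rule yields p = 0; the thesis does not discuss that case.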

Since $X_i = 0$ represents the i'th node getting a percentage of votes greater
than or equal to the percentage of weight assigned by Heuristic 1, $w_i \cdot X_i$
represents the weight of a node getting a percentage of votes less than its percentage of
weight ($w_i$). The objective function (eq. 3.8) minimises the sum of the weights
of the nodes getting a percentage of votes less than their percentage of weight, which is
our aim. Since $V_i$ represents the number of votes assigned to node i, the first term
in constraint (eq. 3.9) represents the votes assigned to group i and the second term
represents the majority of votes in the system. So constraint (3.9) makes sure
that all the k' groups get a majority of votes. The first term in constraint (eq. 3.10),
$w_i / W$, represents the fraction of weight assigned to node i and the second term,
$V_i / V$, represents the fraction of votes assigned to node i.

Case 1: When $X_i$ is zero, constraint (3.10) becomes:

$$\frac{w_i}{W} - \frac{V_i}{V} \le 0, \quad (i = 1, \ldots, N) \qquad (3.13)$$

which makes sure that the i'th node has a percentage of votes greater than or
equal to the percentage of its weight assigned by Heuristic 1.

Case 2: When $X_i$ is one, constraint (3.10) becomes:

$$\frac{w_i}{W} - \frac{V_i}{V} \le 1, \quad (i = 1, \ldots, N) \qquad (3.14)$$

From the above equation, it is clear that when $X_i$ is one, constraint (3.10) may
assign a percentage of votes to node i that is greater than or equal to the percentage
of its weight. We have to make sure that whenever $X_i$ is one the i'th node does not
have the required percentage vote assignment. Since the objective function minimises
$w_i \cdot X_i$, it will always try to set all $X_i$'s to zero. By case 1, $X_i = 0$
means node i gets the required percentage vote assignment. If it is not possible to assign
the required percentage of votes to all the nodes, then some $X_i$'s will have to take the
value one. The objective function selects a subset of the N nodes such that
the sum of $w_i$ for these nodes is maximum and sets the corresponding $X_i$ to zero.
Constraint (3.10) forces these nodes to get the required percentage vote assignment.
All the other nodes, which have the corresponding $X_i$ as one, do not get the required
vote assignment, since the objective function would have set $X_i$ to zero if they could
get the required percentage of votes. The value of total votes assigned through
(eq. 3.12) makes sure that if nothing can be done, this formulation results
in the same vote assignment as that of formulation 1 or 2. As the value of V increases,
the time taken to assign votes also increases.

Example: 

Let us consider the same network given in formulation 1, with node weights
(29160, 29160, 29160). The above formulation will result in:

Minimise:

$$29160 X_1 + 29160 X_2 + 29160 X_3$$

Subject to:

$$\begin{aligned}
V_1 + V_2 + V_3 &\ge 1 \\
V_1 + V_2 - V_3 &\ge 1 \\
-V_1 + V_2 + V_3 &\ge 1 \\
3V_1 + 123X_1 &\ge 41 \\
3V_2 + 123X_2 &\ge 41 \\
3V_3 + 123X_3 &\ge 41 \\
V_1 + V_2 + V_3 &= 41 \\
X_1, X_2, X_3 &= 0 \text{ or } 1
\end{aligned}$$


This will result in a vote assignment of (14,14,13). This is a better vote assignment
than those obtained through the other formulations. Since all the nodes are equally
important, this vote assignment will result in better performance. This formulation
has tried to get a vote assignment similar to the one assigned by Heuristic 1, and it
even makes sure that all the groups, out of the k groups, getting majority of votes in
the other formulations also get majority of votes here. It is basically able to
get a vote assignment such that the groups other than the k' groups getting majority of
votes have more average transaction gain in them than those obtained using the other
formulations.
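The small example above can be checked by exhaustive search. The following is a minimal sketch, not the solver used in VAT: it assumes that votes are non-negative integers, and the names `feasible` and `solve` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

WEIGHTS = (29160, 29160, 29160)   # node weights from the example
TOTAL_VOTES = 41                  # total votes V in the example
# Groups (0-based node indices) that must hold a strict majority of votes.
MAJORITY_GROUPS = [(0, 1, 2), (0, 1), (1, 2)]


def feasible(votes, x):
    """Check the example's constraints for votes V_i and flags X_i."""
    if sum(votes) != TOTAL_VOTES:
        return False
    for group in MAJORITY_GROUPS:
        inside = sum(votes[i] for i in group)
        # votes inside the group minus votes outside must be at least 1
        if inside - (TOTAL_VOTES - inside) < 1:
            return False
    # 3*V_i + 123*X_i >= 41 for every node i
    return all(3 * v + 123 * xi >= 41 for v, xi in zip(votes, x))


def solve():
    """Return (minimum objective, list of optimal (votes, x) pairs)."""
    best, optimal = None, []
    for v1 in range(TOTAL_VOTES + 1):
        for v2 in range(TOTAL_VOTES + 1 - v1):
            votes = (v1, v2, TOTAL_VOTES - v1 - v2)   # assumes V_i >= 0
            for x in product((0, 1), repeat=3):
                if not feasible(votes, x):
                    continue
                cost = sum(w * xi for w, xi in zip(WEIGHTS, x))
                if best is None or cost < best:
                    best, optimal = cost, [(votes, x)]
                elif cost == best:
                    optimal.append((votes, x))
    return best, optimal
```

Under these assumptions the search reports a minimum objective of 29160, and (14,14,13) with $X = (0,0,1)$ is among the optimal solutions. Other assignments reach the same objective value, so which one an integer programming routine returns depends on the routine.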



Chapter 4 


IMPLEMENTATION OF VAT 


Here we describe the implementation details of VAT. It contains two major parts:
simulation and integer programming. First we explain the input and output formats of
VAT, and then we describe the functions of the important modules present in both
parts.



Figure Approach 


We assume that the mean time to repair (MTTR) of each component is 240 time
units. Mean time to failure (MTTF) is calculated using the reliability of the component:

$$\text{reliability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$

The simulation is done for a period of 10000000 time units.
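Solving the reliability relation for MTTF gives MTTF = reliability × MTTR / (1 − reliability). A minimal sketch of that rearrangement follows; the function name `mttf_from_reliability` is hypothetical and not taken from VAT.

```python
def mttf_from_reliability(reliability, mttr=240.0):
    """Mean time to failure implied by reliability = MTTF / (MTTF + MTTR)."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    return reliability * mttr / (1.0 - reliability)
```

For a component with reliability 0.9 and an MTTR of 240 time units, this gives an MTTF of about 2160 time units.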

4.1 Input format 

The input file contains the following information: 

• Format of the output (normal = 0, verbose = 1).

• File containing information on nodes.

• File containing information on links.

A sample input file is shown below:

1

nodes.1

links.1

The input file containing information about nodes has triplets of the form:

< node#, rel, req_arr_rate > where:

node#          number of the node
rel            reliability of that node
req_arr_rate   request arrival rate at that node

A data file on nodes (nodes.1) is shown below:


1 0.9 1.0 

2 0.9 1.0 

3 0.9 1.0 

The input file containing information about links has triplets of the form:

< nd1, nd2, rel > where:

nd1, nd2   the end points of the corresponding link
rel        reliability of the link

Since the links are bidirectional, there will be only one entry for a link between
any pair of nodes. A data file on links (links.1) is shown below:


1 2 0.9 

1 3 0.9 

2 3 0.9 
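The two data files can be read with a few lines of code. This is a minimal sketch assuming whitespace-separated fields as in the samples above; `parse_nodes` and `parse_links` are hypothetical names, and the parsers take the file contents as a string so that they can be tried without files on disk.

```python
def parse_nodes(text):
    """Parse <node#, rel, req_arr_rate> triplets, one per non-empty line.

    Returns {node number: (reliability, request arrival rate)}.
    """
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        nodes[int(fields[0])] = (float(fields[1]), float(fields[2]))
    return nodes


def parse_links(text):
    """Parse <nd1, nd2, rel> triplets, one per non-empty line.

    Links are bidirectional, so each key is stored as (smaller, larger).
    Returns {(nd1, nd2): reliability}.
    """
    links = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        a, b = int(fields[0]), int(fields[1])
        links[(min(a, b), max(a, b))] = float(fields[2])
    return links
```

For example, `parse_nodes(open("nodes.1").read())` would load the node file named in the sample input file.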



4.2 Output format 


Output is given in two modes. One of the two modes, normal or verbose, is selected 
in the input file. 


4.2.1 Normal mode

In the normal mode, the output will show the votes assigned by formulation 2 and
then the final vote assignment obtained using formulation 3.

For the above input data, VAT's output in this mode is:


Vote assignment by
FORMULATION 2
VOTE[1] = 0
VOTE[2] = 1
VOTE[3] = 0


Final vote assignment
FORMULATION 3
VOTE[1] = 14
VOTE[2] = 14
VOTE[3] = 13


4.2.2 Verbose mode 

In this mode, in addition to the vote assignment it will show the top k groups obtained 
through simulation and the average number of transactions gained in each of them. 
It will also show the groups getting majority of votes through the vote assignment. 

For the input file shown in 4.1, VAT's output is:


Top k groups
Group       Average transaction gain
{1,2,3}     400
{1,2}       30
{2,3}       30


Groups getting majority of votes
Group       Average transaction gain
{1,2,3}     400
{1,2}       30
{2,3}       30


Vote assignment using
FORMULATION 2
VOTE[1] = 0
VOTE[2] = 1
VOTE[3] = 0


Final vote assignment using
FORMULATION 3
VOTE[1] = 14
VOTE[2] = 14
VOTE[3] = 13


4.3 Simulation model 




We have used an event-driven simulation model to find the most likely groups formed
due to partition failures, along with the average transaction gain in each of them if
they were assigned majority of votes. The simulation data is obtained during run
time, i.e. the next failure time of any node is calculated only after the current failure
of that node, and similarly for the other events. With this there is no need to store the
past or future event times, which reduces the memory requirement of the simulation
program. The basic modules present here are:



Figure Simulation modules and their interactions


Initialiser: It initialises the network topology and, using the given input data, gets
the times at which the first failure of each component occurs.

Siml_data_gen: This is used to get the time at which the next event occurs at the same
node/link. We have assumed that the node/link failure/recovery times are exponentially
distributed, so the time at which the next event occurs is calculated as

$$T_i = T_{i-1} + m_t \ln(1/k)$$

where

$T_{i-1}$ = time at which the event has occurred.

$m_t$ = mean inter-event time.

$k$ = pseudo-random number between 0 and 1.
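For exponentially distributed inter-event times, a standard way to draw the next event time from a uniform pseudo-random number is inverse-transform sampling. The sketch below shows that standard technique; it is not necessarily the exact expression used in VAT, and the name `next_event_time` and its parameters are assumptions made for illustration.

```python
import math
import random


def next_event_time(t_prev, mean_interval, k=None, rng=random):
    """Return t_prev plus an exponentially distributed delay.

    t_prev        -- time at which the last event occurred
    mean_interval -- mean inter-event time (e.g. MTTF or MTTR)
    k             -- uniform number in (0, 1]; drawn from rng if omitted
    """
    if k is None:
        k = 1.0 - rng.random()  # lies in (0, 1], so log(k) is defined
    # Inverse-transform sampling: -mean * ln(k) is exponential with that mean.
    return t_prev - mean_interval * math.log(k)
```

Because only the previous event time is needed to produce the next one, each component has to keep a single pending event time, which matches the memory-saving scheme described for the simulator.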

Next_eve_fnd: This is used to find the time at which the next immediate global event
occurs.

Partition_finder: This module keeps track of the network topology. Whenever there
is a failure/recovery, it modifies the network topology accordingly, finds the
connected components and gets the top two groups present in the network by finding
the TR_GAIN_RATE of different groups.
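The connected-component step can be illustrated with a breadth-first search over the working nodes and links. This is a minimal sketch under the assumption that a group is a set of working nodes joined by working links; the function name `connected_groups` is hypothetical, and ranking the groups by TR_GAIN_RATE is not covered here.

```python
from collections import deque


def connected_groups(nodes_up, links_up):
    """Return the groups (connected components) of the surviving network.

    nodes_up -- iterable of node numbers that are currently working
    links_up -- iterable of (a, b) pairs for links that are currently working
    """
    alive = set(nodes_up)
    adjacency = {n: set() for n in alive}
    for a, b in links_up:
        # A working link is usable only if both of its end points are up.
        if a in alive and b in alive:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in sorted(alive):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), {start}
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    group.add(neighbour)
                    queue.append(neighbour)
        groups.append(frozenset(group))
    return groups
```

For the three-node sample network, losing links (1,3) and (2,3) leaves the groups {1,2} and {3}.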

tr_gain_fnd: This module keeps track of the HPG/NHPG groups formed and finds 
the total fraction of time for which each group is present as HPG/NHPG. We kept 
track of the first 4000 HPG/NHPG groups formed during simulation. 

Simulator: This is the main module. It initially gets the input data for simulation
and makes a call to the initialiser module to start the simulation. It uses the
Next_eve_fnd module to get the time at which the next event occurs and advances the
simulation time to it. Whenever a failure/recovery occurs, it makes a call to the
partition_finder module and to tr_gain_fnd to keep track of the groups. Finally, it
gets the top k groups obtained using the AVG_TR_GAIN of each of the groups.
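The interaction of the modules above can be sketched as a small event loop. This is a simplified illustration, not VAT's code: it records the fraction of time each connected group is present rather than the HPG/NHPG bookkeeping and transaction gains, it uses Python's `random.expovariate` for the exponential delays, and the names `find_groups` and `simulate` are hypothetical.

```python
import random
from collections import defaultdict


def find_groups(nodes, links, up):
    """Group working nodes joined by working links (union-find)."""
    parent = {n: n for n in nodes if up[n]}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        if up[(a, b)] and a in parent and b in parent:
            parent[find(a)] = find(b)
    members = defaultdict(set)
    for n in parent:
        members[find(n)].add(n)
    return [frozenset(m) for m in members.values()]


def simulate(nodes, links, mttf, mttr, horizon, seed=0):
    """Return {group: fraction of simulated time the group was present}.

    mttf, mttr -- dicts mapping each node number or (a, b) link to a mean time
    """
    rng = random.Random(seed)
    components = list(nodes) + list(links)
    up = {c: True for c in components}
    # Only the next pending event time of each component is stored.
    next_time = {c: rng.expovariate(1.0 / mttf[c]) for c in components}
    present = defaultdict(float)
    now = 0.0
    while now < horizon:
        comp = min(next_time, key=next_time.get)  # next global event
        t = min(next_time[comp], horizon)
        for group in find_groups(nodes, links, up):
            present[group] += t - now  # group existed during [now, t)
        now = t
        if now >= horizon:
            break
        up[comp] = not up[comp]  # failure or recovery of that component
        mean = mttf[comp] if up[comp] else mttr[comp]
        next_time[comp] = now + rng.expovariate(1.0 / mean)
    return {g: d / horizon for g, d in present.items()}
```

The groups with the largest fractions are the most likely groups. Selecting the top k in VAT additionally weighs each group by its average transaction gain, which this sketch leaves out.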


4.4 Integer programming

This module gets the top k (k = 20) groups, along with the average transaction gain
in each of the groups if they were assigned majority of votes, from the simulation part.
Its aim is to get the vote assignment such that the average transactions gained in
the system is maximized. We use formulation 3 to get the vote assignment since
it gives a better vote assignment than the other two. It has three major modules:
prob_formulator, IP_routine, vote_assign.

prob_formulator: It uses the groups given by the simulator module and formulates
this as an integer programming problem, using one of formulation 1 or 2 or 3. Here,
we have used formulation 3, in which formulation 2 is used to find the k' groups which
should get majority of votes.

IP_routine: This uses Gomory's method with Wilson's cuts algorithm [WS 67] to
solve the integer programming problem.

vote_assign: This is the main module in this part. Initially it makes a call to the
prob_formulator module and gets the integer programming problem for formulation
2. With this, it uses IP_routine to get the best subset out of the k groups which will
get majority of votes. Using these groups it again calls prob_formulator to get the
integer programming problem using formulation 3 and then calls IP_routine to get
the final vote assignment.


4.5 Random graph generator 

We have used the model suggested in [BW 88], where it is claimed that the graphs
generated here will have characteristics of real time networks. We have implemented
this model to get the random graphs and obtained the vote assignment using VAT.

All the integer programming formulations are verified on the standard commercially
available LiNear Discrete Optimiser (LINDO) package.



Chapter 5 


EXPERIMENTS 


We have tried the vote assignment tool (VAT) for different types of networks. In
this section, we discuss the experiments done using VAT. As mentioned before, VAT
considers the average number of transactions gained per unit time as the performance
metric. It was found that in many of the cases VAT assigns votes similar to those
assigned by the best one of the existing heuristics. These heuristics have different
metrics for performance evaluation. One of the metrics used is the steady-state
probability that the system is up.


5.1 Experiment 1 

We have used VAT to find the vote assignment for the different network configurations
given in [BG 87], [TK 91]. The following table shows the results. We have
calculated the best vote assignment done by the heuristics for those networks; the
equivalent vote assignment with the minimum number of total votes assigned is obtained
and is given in the following table. (Vote assignments are said to be equivalent if
the groups getting majority of votes are the same in all of them.) The table also gives
the vote assignment with minimum total votes assigned which is equivalent to
that of VAT's assignment.
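The equivalence test defined above can be checked mechanically for small networks by enumerating every group of nodes. The following is a minimal sketch that assumes a group holds a majority when its votes strictly exceed half of the total; `majority_groups` and `equivalent` are hypothetical names. The enumeration is exponential in the number of nodes, so it is only practical for networks of the size used in this experiment.

```python
from itertools import combinations


def majority_groups(votes):
    """Return the set of node groups (index tuples) holding a strict majority."""
    total = sum(votes)
    n = len(votes)
    winners = set()
    for size in range(1, n + 1):
        for group in combinations(range(n), size):
            # Strict majority: group votes exceed the votes outside the group.
            if 2 * sum(votes[i] for i in group) > total:
                winners.add(group)
    return winners


def equivalent(a, b):
    """True if both vote assignments give a majority to the same groups."""
    return len(a) == len(b) and majority_groups(a) == majority_groups(b)
```

For instance, `equivalent((9, 8, 8, 8, 8), (1, 1, 1, 1, 1))` holds because in both assignments exactly the groups of three or more nodes obtain a majority.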

Network configurations

5 node fully connected network

1. All components reliability = 0.6

2. All components reliability = 0.7

3. All components reliability = 0.8

4. All components reliability = 0.9

5. All components except node a, reliability = 0.9, node a reliability = 0.95

6. All components except node a, reliability = 0.9, node a reliability = 0.98

7. All components except node a, reliability = 0.9, node a reliability = 0.99

8. All components except node a, reliability = 0.9, node a reliability = 0.999

9. Nodes a,b,c reliability = 0.9, rest = 0.5, links (a,b),(a,c),(b,c) rel = 0.9, rest
= 0.5

10. Nodes a,b,c reliability = 0.9, rest = 0.5, links (a,b),(a,c),(b,c) rel = 0.9, rest
= 0.8

11. Nodes a,b,c reliability = 0.9, rest = 0.8, links (a,b),(a,c),(b,c) rel = 0.9, rest
= 0.8

12. Node a reliability = 0.99, rest = 0.8, all links = 0.9

13. Node a reliability = 0.999, rest = 0.8, all links = 0.8

14. Node a reliability = 0.999, rest = 0.85, all links = 0.85

6 node fully connected network

15. All components reliability = 0.6

16. Nodes a,b,c rel = 0.9, rest = 0.6, links among a,b,c = 0.9, rest = 0.6.

17. Nodes a,b,c rel = 0.9, rest = 0.6, links among a,b,c = 0.9, rest = 0.5.

18. Node a reliability = 0.9, rest = 0.6, links reliability = 0.6

19. All components reliability = 0.9

Non-fully connected 5-node system

20. same as 6), but link (a,e) deleted.

21. same as 20), but links (a,b) and (c,d) are deleted.

22. same as 21), but links (c,e) and (b,d) are deleted.

The vote assignment for the last two cases is given by the heuristic in [TK 91],
where it is assumed that the reliability obtained by their method will be compared
with the reliability given by singleton and uniform voting and then the best one is
selected. This approach is not feasible for big networks since finding the probability
that a group of nodes remains connected is NP-hard [AR 77].



                      Vote assignment

Case   Best assignment by    VAT's               VAT's equivalent
       the heuristics        assignment          minimal assignment

1      (1,0,0,0,0)           (9,8,8,8,8)         (1,1,1,1,1)
2      (1,1,1,1,1)           (9,8,8,8,8)         (1,1,1,1,1)
3      (1,1,1,1,1)           (9,8,8,8,8)         (1,1,1,1,1)
4      (1,1,1,1,1)           (9,8,8,8,8)         (1,1,1,1,1)
5      (1,1,1,1,1)           (9,8,8,8,8)         (1,1,1,1,1)
6      (1,1,1,1,1)           (9,8,8,8,8)         (1,1,1,1,1)
7      (3,1,1,1,1)           (9,8,1,8,9)         (2,1,1,1,2)
8      (1,1,1,1,1)           (17,1,7,1,9)        (3,1,1,1,1)
9      (1,1,1,0,0)           (10,10,11,4,4)      (1,1,1,0,0)
10     (1,1,1,0,0)           (14,9,11,5,0)       (1,1,1,0,0)
11     (1,1,1,1,1)           (9,9,9,7,7)         (1,1,1,1,1)
12     (3,1,1,1,1)           (17,7,1,9,1)        (3,1,1,1,1)
13     (1,0,0,0,0)           (25,8,0,0,8)        (1,0,0,0,0)
14     (1,0,0,0,0)           (25,0,8,8,0)        (1,0,0,0,0)
15     (1,0,0,0,0,0)         (7,7,7,7,7,6)       (1,1,1,1,1,0)
16     (2,2,2,1,1,1)         (8,8,8,5,5,5)       (2,2,2,1,1,1)
17     (1,1,1,0,0,0)         (9,9,9,4,4,4)       (2,2,2,1,1,1)
18     (1,0,0,0,0,0)         (1,0,0,0,0,0)       (1,0,0,0,0,0)
19     (0,1,1,1,1,1)         (6,7,7,7,7,7)       (0,1,1,1,1,1)
20     (1,1,1,1,1)           (7,9,9,9,7)         (1,1,1,1,1)
21     (1,0,0,0,0)           (6,8,9,9,9)         (1,1,1,1,1)
22     (1,0,0,0,0)           (16,7,8,8,0)        (2,1,1,1,0)


Table Vote assignments




It was observed that in many of the cases the vote assignment done by VAT is
equivalent to that of the best assignment given by the heuristics. Since the metric
used for performance evaluation is different, in some of the cases the vote assignment
done by VAT is different from the best vote assignment done by the heuristics.


5.2 Experiment 2 

Here we have obtained the random graphs using the model suggested in [BW 88]. From
this model, we get the topology of the network. The reliabilities of the components
are assigned such that all of them are greater than 0.5. The networks
for different connectivities and for different numbers of nodes are obtained using the
above model. We have obtained the vote assignment for these networks using VAT.

In all the existing heuristics the number of votes assigned to each node depends
upon the reliability of the neighbouring nodes and the reliability of the links incident
on it. But there exist cases in which the priority of the node should be determined
not only from the local information but also from how well it is connected to all the
other nodes and the reliability of all the nodes. Let us consider the following network.

• All nodes reliability, except node 6 = 0.9

• Node 6 reliability = 0.6

• Reliability of links (5, 6), (6, 7), (6, 10) = 0.9

• Reliability of other links = 0.95



All the existing heuristics will give equal priority to the nodes 1, 8 and 11, even
though node 1 is present in a better partition. Our tool determines through simulation
that the most likely groups are {1,2,3,4,5}, {7,8,9} and {10,11,12}. Since {1,2,3,4,5}
is a better group it assigns majority of votes to this group.

Evaluation of vote assignments is not feasible for big networks. So, for big networks,
it is not feasible to compare the performance offered by VAT's vote assignment
with that of the other heuristics. Since global information will be more important in
these types of networks, we expect VAT's vote assignment to give better performance
than the other heuristics.



Chapter 6


CONCLUSIONS 


Voting is a well known technique for controlling access to replicated data. The votes
assigned to each of the nodes can have a marked effect on the system performance.
In this report, we have discussed an integer programming approach to solve the vote
assignment problem using the average transaction gain per unit time as the performance
metric. In this approach, first we use monte-carlo simulation to determine the
priority groups and then using these groups the vote assignment problem is formulated
as an integer programming problem. The entire procedure has been implemented in
the Vote Assignment Tool (VAT).

Unlike the existing heuristics, our approach uses global information in deciding the
voting pattern, i.e. the votes assigned to each of the nodes depend upon how well it is
connected to all the other nodes and the reliability of all the nodes. This has resulted in
a better voting pattern. But the main drawback is that as the number of nodes increases,
the overhead involved also increases. We have tried the tool on the test cases given
in [BG 87], [TK 91]. In most of the cases it assigns votes similar to the best one
of the heuristics. We have also used this tool on the random graphs generated by the
model suggested in [BW 88], to determine the vote assignments.